Learning Objectives

This section covers the following topics:

  • Creation of sf::LINESTRING and sf::MULTIPOLYGON objects from raw data.

  • Loading map polygons from get_data()

  • Importing data from ESRI shapefile.

  • Implementing Spatial Joins.

Data Sources: De novo Creation

It is possible to make data from scratch by entering coordinates directly.

df <- data.frame( id = c(rep("Rodney",5), rep("Sarah",5)),
                  Data = rnorm( 10, 42, 42),
                  Longitude = rnorm(10, -78, 1 ),
                  Latitude = rnorm(10, 37, 1) )
df
       id       Data Longitude Latitude
1  Rodney  13.949899 -78.44814 38.24930
2  Rodney  -7.099641 -77.75363 36.93811
3  Rodney 107.714647 -78.83435 36.46239
4  Rodney  65.505546 -77.10051 37.35242
5  Rodney   1.501342 -77.77865 36.81856
6   Sarah  42.510296 -78.15033 37.89910
7   Sarah  26.866653 -79.56341 37.62405
8   Sarah  80.559394 -78.37316 37.35191
9   Sarah  50.153114 -76.93191 38.07141
10  Sarah -23.997563 -74.76189 36.84952

Data Sources: De novo Creation

To Make to sf as POINT objects as normal.

df %>%
  st_as_sf( coords = c("Longitude", "Latitude"), crs=4326 ) -> pts
pts
Simple feature collection with 10 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -79.56341 ymin: 36.46239 xmax: -74.76189 ymax: 38.2493
Geodetic CRS:  WGS 84
       id       Data                   geometry
1  Rodney  13.949899  POINT (-78.44814 38.2493)
2  Rodney  -7.099641 POINT (-77.75363 36.93811)
3  Rodney 107.714647 POINT (-78.83435 36.46239)
4  Rodney  65.505546 POINT (-77.10051 37.35242)
5  Rodney   1.501342 POINT (-77.77865 36.81856)
6   Sarah  42.510296  POINT (-78.15033 37.8991)
7   Sarah  26.866653 POINT (-79.56341 37.62405)
8   Sarah  80.559394 POINT (-78.37316 37.35191)
9   Sarah  50.153114 POINT (-76.93191 38.07141)
10  Sarah -23.997563 POINT (-74.76189 36.84952)

Plotting Points

We can easily mix individual aesthetic properties to change the representaion of the points provided by using geom_sf().

 

ggplot( pts ) + 
  geom_sf( aes(color = id, size=Data)) + 
  coord_sf() + 
  theme(axis.text.x = element_text(angle = 45, 
                                   vjust = 1, 
                                   hjust=1))

 

 

Data Sources: De novo Creation

To make sf as LINESTRING we can group by the id value and then we need to summarize the Data component, then cast it to a Linestring.

df %>%
  st_as_sf( coords = c("Longitude", "Latitude"), crs=4326 ) %>%
  group_by( id ) %>%
  summarize( Data = mean( Data ) ) %>%
  st_cast("LINESTRING") -> lines 

 

lines
Simple feature collection with 2 features and 2 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -79.56341 ymin: 36.46239 xmax: -74.76189 ymax: 38.2493
Geodetic CRS:  WGS 84
# A tibble: 2 × 3
  id      Data                                                          geometry
  <chr>  <dbl>                                                  <LINESTRING [°]>
1 Rodney  36.3 (-78.83435 36.46239, -77.10051 37.35242, -77.75363 36.93811, -77…
2 Sarah   35.2 (-79.56341 37.62405, -78.37316 37.35191, -78.15033 37.8991, -76.…

Visualizing Line Objects

Just like when we were working with POINT objects, we can use both built-in graphing approaches as well as ggplot() for visualizing. Here I’m goint to plot the two line objects using normal graphics.

plot( lines["Data"], 
      col=c("red","green"),
      lwd = 2)

 

 

Operations on LINESTRING Objects

Physical length of the LINESTRING objects

st_length( lines )
Units: [m]
[1] 440339.0 516527.8

The bounding box around all the lines.

st_bbox( lines )
     xmin      ymin      xmax      ymax 
-79.56341  36.46239 -74.76189  38.24930 

The area of the convex hull encompassing all the lines.

library( units )
lines %>% 
  st_union() %>% 
  st_convex_hull() %>% 
  st_area() %>%
  set_units( km^2 )
50553.2 [km^2]

Textual Versions of geometry

While it is easy to go from text \(\to\) POLYGON, the same thing can be done going in the opposite direction.

lines$geometry[1] %>% st_as_text()
[1] "LINESTRING (-78.83435 36.46239, -77.10051 37.35242, -77.75363 36.93811, -77.77865 36.81856, -78.44814 38.2493)"

Polygons

Polygons are simply lines whose first and last coordinate are exactly the same

df.p <- df[ c(1:5,1, 6:10,6),]
df.p
        id       Data Longitude Latitude
1   Rodney  13.949899 -78.44814 38.24930
2   Rodney  -7.099641 -77.75363 36.93811
3   Rodney 107.714647 -78.83435 36.46239
4   Rodney  65.505546 -77.10051 37.35242
5   Rodney   1.501342 -77.77865 36.81856
1.1 Rodney  13.949899 -78.44814 38.24930
6    Sarah  42.510296 -78.15033 37.89910
7    Sarah  26.866653 -79.56341 37.62405
8    Sarah  80.559394 -78.37316 37.35191
9    Sarah  50.153114 -76.93191 38.07141
10   Sarah -23.997563 -74.76189 36.84952
6.1  Sarah  42.510296 -78.15033 37.89910

Polygon de novo Creation

df %>%
  st_as_sf( coords = c("Longitude", "Latitude"), crs=4326 ) %>%
  group_by( id ) %>%
  summarize( Data = mean( Data ) ) %>%
  st_cast("POLYGON") -> polygons

 

polygons
Simple feature collection with 2 features and 2 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -79.56341 ymin: 36.46239 xmax: -74.76189 ymax: 38.2493
Geodetic CRS:  WGS 84
# A tibble: 2 × 3
  id      Data                                                          geometry
  <chr>  <dbl>                                                     <POLYGON [°]>
1 Rodney  36.3 ((-78.83435 36.46239, -77.10051 37.35242, -77.75363 36.93811, -7…
2 Sarah   35.2 ((-79.56341 37.62405, -78.37316 37.35191, -78.15033 37.8991, -76…

Visualizing Polygon Objects

ggplot( polygons ) + 
  geom_sf( aes( fill = Data ), alpha=0.75 ) + coord_sf() + 
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=1))

Polygons From Data Frames

map_data( "county", "virginia") %>%
  dplyr::select( Longitude = long,
                 Latitude = lat,
                 group,
                 County = subregion) -> va_counties
head( va_counties )
  Longitude Latitude group   County
1 -75.27519 38.03867     1 accomack
2 -75.21790 38.02721     1 accomack
3 -75.21790 38.02721     1 accomack
4 -75.24655 37.99283     1 accomack
5 -75.30384 37.94127     1 accomack
6 -75.31530 37.92981     1 accomack

ggplot( va_counties, aes( Longitude, Latitude) ) + 
  geom_point( size=0.25 ) + 
  coord_quickmap()

va_counties %>%
  filter( County %in%  c("hanover","henrico") ) %>%
  ggplot( aes(Longitude, Latitude) ) + 
  geom_polygon( aes( fill = County), alpha=0.1 ) +
  geom_point( aes( color = County) ) +
  coord_quickmap()

ggplot( va_counties, aes( Longitude, Latitude) ) + 
  geom_polygon( aes( group=group ), fill="grey80",
                color = "black", linewidth = 0.25) + 
  coord_quickmap()

ggplot() + 
  geom_polygon( aes( Longitude, Latitude, group=group ),
                fill="grey80",color = "black", 
                size = 0.25, data=va_counties) +
  geom_sf( aes( color=Data), 
           lwd=3, data=lines) + 
  coord_sf()

Shapefiles

The ESRI Shapefile is a non-standardized format for packaging vector based data (POINT, LINESTRING, POLYGON, etc.).

However, it is not actually a file but it is a collection of files, which may (or may not) expand directly in the folder or within a subfolder.

Archived ZIP Files.

I’ve uploaded some shapefiles to the Github Repository for this class. These are from the Richmond City GIS Department and represent the centerlines of roads in the city as well as the zoning districts.

Here are the URL’s for both of these files.

roads_url <- "https://github.com/dyerlab/ENVS-Lectures/raw/master/data/Centerlines-shp.zip"
district_url <- "https://github.com/dyerlab/ENVS-Lectures/raw/master/data/Zoning_Districts-shp.zip"

Downloading Files Using R

You can use your browser and copy and paste those addresses in to download the files to your computer. Then you’ll have to move those archives to your PROJECT FOLDER (you are using Projects right?).

OR

You can have R download the file and save it locally.

download.file( district_url , destfile = "./Districts.zip")
download.file( roads_url, destfile =  "./Roads.zip")

Unziping Files

You can open your computer finder and have the OS unzip the archives.

OR

You can have R do it.

unzip("Districts.zip")
unzip("Roads.zip")

File Components

The Projection File

Loading A Shapefile

You can load it in using sf::st_read() and passing it the .shp file path to it.

st_read("Zoning_Districts.shp") -> districts 
Reading layer `Zoning_Districts' from data source 
  `/Users/rodneydyer/Documents/Teaching/Shapefiles/Zoning_Districts.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 634 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 11743500 ymin: 3687944 xmax: 11806060 ymax: 3744741
Projected CRS: NAD83 / Virginia South (ftUS)

Shapefiles Convert to sf Objects

By default, the data associated with a vector object are put into a data.frame and the geometry object is properly created.

head( districts, n=2 )
Simple feature collection with 2 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 11773600 ymin: 3730159 xmax: 11789510 ymax: 3731016
Projected CRS: NAD83 / Virginia South (ftUS)
  OBJECTID Name Ordinance OrdinanceP Conditiona AdoptionDa Comment
1        1 RO-2      <NA>       <NA>         No 2000-01-01    <NA>
2        2  B-2      <NA>       <NA>         No 2000-01-01    <NA>
           CreatedBy CreatedDat             EditBy   EditDate
1 richard.morton_cor 2020-08-24 richard.morton_cor 2020-08-24
2 richard.morton_cor 2020-08-24 richard.morton_cor 2020-08-24
                              GlobalID Shape__Are Shape__Len
1 334799f0-fe38-46bf-97c2-260f5a036559   60150.29   983.6815
2 558df9cd-4f9c-4248-a689-bc2d9c79d060   56987.01   971.8832
                        geometry
1 MULTIPOLYGON (((11773598 37...
2 MULTIPOLYGON (((11789222 37...

Simplifying Data

names( districts )
 [1] "OBJECTID"   "Name"       "Ordinance"  "OrdinanceP" "Conditiona"
 [6] "AdoptionDa" "Comment"    "CreatedBy"  "CreatedDat" "EditBy"    
[11] "EditDate"   "GlobalID"   "Shape__Are" "Shape__Len" "geometry"  

Simplifying Data

districts %>% 
  dplyr::select( -Comment, -CreatedBy, -CreatedDat, -EditBy, -EditDate ) %>%
  dplyr::select( -Shape__Are, -Shape__Len) -> districts
head( districts, n=1)
Simple feature collection with 1 feature and 7 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 11773600 ymin: 3730159 xmax: 11773940 ymax: 3730493
Projected CRS: NAD83 / Virginia South (ftUS)
  OBJECTID Name Ordinance OrdinanceP Conditiona AdoptionDa
1        1 RO-2      <NA>       <NA>         No 2000-01-01
                              GlobalID                       geometry
1 334799f0-fe38-46bf-97c2-260f5a036559 MULTIPOLYGON (((11773598 37...

District Codes

districts$Name %>% unique() %>% sort()
 [1] "B-1"     "B-2"     "B-3"     "B-4"     "B-5"     "B-6"     "B-7"    
 [8] "CM"      "DCC"     "I"       "M-1"     "M-2"     "OS"      "R-1"    
[15] "R-2"     "R-3"     "R-4"     "R-43"    "R-48"    "R-5"     "R-53"   
[22] "R-6"     "R-63"    "R-7"     "R-73"    "R-8"     "R-MH"    "RF-1"   
[29] "RF-2"    "RO-1"    "RO-2"    "RO-3"    "RO2-PE2" "RO2-PE4" "RP"     
[36] "TOD-1"   "UB"      "UB-2"    "UB-PE2"  "UB-PE3"  "UB-PE4"  "UB-PE5" 
[43] "UB-PE6"  "UB-PE7"  "UB-PE8"  "UB-PO1"  "UB-PO2"  "UB-PO3"  "UB-PO4" 
[50] "UB2-PE8"

Visualizing

plot( districts["Name"] )

Visualizing Overlays

plot( st_geometry( districts ))
plot( st_geometry( districts[districts$OBJECTID==530,] ), 
      col='red', add=TRUE )

Spatial Joins

In the topic covering [Relational Operations], we used Primary and Foreign Keys to join data.frame objects and combine data. We can do simliar operations using geospatial positions to perform spatial joins.

There are a wide variety of operations available:

Check out the Cheetsheet

Secondary Vector Data Set

road.shapefile <- st_read("Centerlines-shp/tran_Carriageway.shp")
Reading layer `tran_Carriageway' from data source 
  `/Users/rodneydyer/Documents/Teaching/Shapefiles/Centerlines-shp/tran_Carriageway.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 29081 features and 25 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 11734060 ymin: 3682790 xmax: 11817490 ymax: 3751927
Projected CRS: NAD83 / Virginia South (ftUS)
names( road.shapefile )
 [1] "FID"        "Carriagewa" "AssetID"    "StreetType" "Functional"
 [6] "FIPS"       "LeftFromAd" "LeftToAddr" "RightFromA" "RightToAdd"
[11] "PrefixDire" "ProperName" "SuffixType" "SuffixDire" "FullName"  
[16] "RouteName"  "OneWay"     "PostedSpee" "CADRouteSp" "CreatedBy" 
[21] "CreatedDat" "EditBy"     "EditDate"   "GlobalID"   "SHAPE_Leng"
[26] "geometry"  

Check Projections

Dyer’s First Rule: Make Sure Projections Are the Same

st_crs( road.shapefile ) == st_crs( districts )
[1] TRUE

If they differed, one could do something like:

road.shapefile %>%
  st_transform( crs = st_crs( districts ) ) -> road.shapefile

Data Cleanup

road.shapefile %>%
  dplyr::select( FullName, OneWay, StreetType,
                 SpeedLimit = PostedSpee, Length = SHAPE_Leng,
                 geometry) %>%
  mutate( OneWay = factor( OneWay ),
          StreetType = factor( StreetType) ) -> roads
summary( roads )
   FullName          OneWay                    StreetType      SpeedLimit   
 Length:29081       1   :    2   Alley              : 3139   Min.   : 0.00  
 Class :character   FT  : 3225   Artery             : 5201   1st Qu.:25.00  
 Mode  :character   TF  : 3148   Highway            :  384   Median :25.00  
                    NA's:22706   Highway Interchange:   93   Mean   :25.17  
                                 Private            : 1540   3rd Qu.:25.00  
                                 Ramp               :  378   Max.   :55.00  
                                 Secondary          :18346   NA's   :7959   
     Length                   geometry    
 Min.   :    2.943   LINESTRING   :29081  
 1st Qu.:  164.849   epsg:2284    :    0  
 Median :  292.517   +proj=lcc ...:    0  
 Mean   :  379.338                        
 3rd Qu.:  461.146                        
 Max.   :17851.326                        
                                          

Visualizing: Filter Highways & Plot

roads %>%
  filter( StreetType %in% c("Highway", 
                            "Highway Interchange",
                            "Ramp")) -> highways
summary( highways )
   FullName          OneWay                  StreetType    SpeedLimit   
 Length:855         1   :  0   Alley              :  0   Min.   : 0.00  
 Class :character   FT  :416   Artery             :  0   1st Qu.:25.00  
 Mode  :character   TF  :408   Highway            :384   Median :25.00  
                    NA's: 31   Highway Interchange: 93   Mean   :36.19  
                               Private            :  0   3rd Qu.:55.00  
                               Ramp               :378   Max.   :55.00  
                               Secondary          :  0   NA's   :131    
     Length                  geometry  
 Min.   :   14.46   LINESTRING   :855  
 1st Qu.:  276.48   epsg:2284    :  0  
 Median :  726.13   +proj=lcc ...:  0  
 Mean   : 1055.15                      
 3rd Qu.: 1170.35                      
 Max.   :17851.33                      
                                       

Plot It

Let’s color this based upon SpeedLimit

Visualizing A Single Entity

Using Normal Filtering

roads %>% 
  filter( FullName == "Three Chopt Road") %>%
  ggplot() + 
  geom_sf( aes(color=SpeedLimit), 
           lwd=2) + 
  coord_sf() + theme(axis.text.x = element_text(angle = 45) )

Intersection Plotting

Let’s grab a bit of zoning from The Fan

districts %>%
  filter( OBJECTID == 368 ) %>%
  st_buffer(dist = 1500) %>%
  st_bbox() -> fan_bbox

districts %>%
  st_crop( fan_bbox ) -> theFan 

plot( theFan["Name"])

Add Auxillary Data

zone_url <- "https://raw.githubusercontent.com/dyerlab/ENVS-Lectures/master/data/DistrictCodes.csv"

theFan %>%
  left_join( read_csv( zone_url ),
             by="Name") %>%
  mutate( Category = factor( Category) ) %>%
  select( OBJECTID, 
          Name, 
          Category, 
          everything() )  -> theFan

 

ggplot( theFan ) + geom_sf( aes( fill=Category)) + 
  scale_fill_brewer( type="qual", palette = "Set3") + theme_void()

Cropping Roads

roads %>% st_crop( st_bbox( theFan )) -> fanRoads
plot( st_geometry( fanRoads ))

Selecting a Single Zone

Add an attribute to the POLYGON’s indicating if the district is that big one in the middle of the plot. Let’s plot it to see what the ID is.

theFan %>%
  mutate( Target = ifelse( OBJECTID == 368, 
                           TRUE, 
                           FALSE) ) -> theFan

Plot it

theFan %>%
  ggplot() + 
  geom_sf( aes(fill=Target) ) + 
  geom_sf_text( aes(label=OBJECTID), size=3 ) +
  coord_sf() + theme_void()

Isolate the District

target <- theFan[ theFan$OBJECTID == 368, ]
plot( st_geometry( target ) ) 

 

fanRoads %>%
  filter( st_intersects( fanRoads, target, 
                         sparse = FALSE ) == TRUE ) %>%
  as_data_frame() %>% select( `Street Name` = FullName ) %>%
  arrange( `Street Name`) %>% unique() 
# A tibble: 39 × 1
   `Street Name` 
   <chr>         
 1 Allison St    
 2 Birch St      
 3 Boyd St       
 4 Floyd Ave     
 5 Grove Ave     
 6 Hanover Ave   
 7 Kensington Ave
 8 Lombardy Pl   
 9 Madumbie Lane 
10 Monument Ave  
# ℹ 29 more rows

 

fanRoads %>%
  filter( st_intersects( fanRoads, target, 
                         sparse = FALSE ) == TRUE ) %>%
  ggplot() +
  geom_sf( data=target  ) +
  geom_sf( color="darkgreen" ) + theme_void()

Questions

If you have any questions, please feel free to either post them as an “Issue” on your copy of this GitHub Repository, post to the Canvas discussion board for the class, or drop me an email.

Peter Sellers looking bored